6.3 HATS HA Problem-Alerting System


HATS HA provides a problem-alerting system that sends alarms to designated people about any failure condition in
a network. To use the HATS HA problem-alerting system, you write an executable file that tells it when, where,
and how to send the alarm. For example, if you want the system administrator to be notified when a standby
machine takes over a specified job, you state that in your executable file.

The HATS HA problem-alerting system operates through the <job>_start and <job>_stop scripts described previously.
Each time HATS HA starts to service a job, it executes the <job>_start script, and it executes the <job>_stop
script when the service stops. HATS HA passes two parameters to each script:

  a. which machine (the originally designated service machine or the originally designated standby machine) has
     started or stopped servicing the specified job;
  b. the reason why that machine has started or stopped servicing the specified job.

The second parameter is most useful in the <job>_stop script.
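
As a minimal sketch of how a script might interpret these two parameters: the encoding of the first parameter (1 for the service machine, anything else for the standby machine) is taken from the example scripts in this section, the helper name describe_event is hypothetical, and the reason is treated as an opaque value because its possible values are not listed here.

```shell
#!/bin/sh
# Hypothetical helper: turn the two HATS HA parameters into a readable line.
# $1 = machine role (1 = service machine, otherwise standby; assumed from
#      the example scripts in this section)
# $2 = reason (opaque here; its values are not listed in this section)
describe_event() {
    if [ "$1" = 1 ]; then
        role="Service"
    else
        role="Standby"
    fi
    echo "$role stop reason $2"
}
```

Inside a <job>_stop script, the helper would be called as describe_event "$1" "$2".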

The HATS HA problem-alerting system lets you choose how the alarm is sent. For example, you can choose to be
called on a pager, to be notified through e-mail, or to view a message on your machine's monitor.
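
The choice of delivery method can be isolated in one small function. The sketch below is an assumption, not part of HATS HA: the function names alert_text and send_alert, the ALERT_METHOD and ALERT_TO variables, and the default recipient root are all hypothetical. It relies on the standard mailx command for e-mail and on writing to /dev/console as one common way to put a message on the machine's monitor.

```shell
#!/bin/sh
# Hypothetical sketch: choose how an alarm is delivered.
ALERT_METHOD=${ALERT_METHOD:-console}   # assumed values: mail, console
ALERT_TO=${ALERT_TO:-root}              # assumed e-mail recipient

# Build the alarm text from a job name and a reason value.
alert_text() {
    echo "HATS HA alarm: job $1, reason $2"
}

# Deliver the alarm by the selected method.
send_alert() {
    case "$ALERT_METHOD" in
        mail)    alert_text "$1" "$2" | mailx -s "HATS HA alarm" "$ALERT_TO" ;;
        console) alert_text "$1" "$2" > /dev/console ;;
        *)       alert_text "$1" "$2" ;;   # fall back to standard output
    esac
}
```

Keeping the message text separate from the delivery step means a new method can be added as one more case branch without touching the rest of the script.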

  1. Example of a problem-alerting executable file that calls a pager.

    To help you write an executable file that uses the HATS HA paging system, we provide a sample file
    called page_adm. You can find this file in the /usr/HA/ha_file directory. The file looks like this:

    # Modem device, pager phone number, pager code, and terminator
    direct="/dev/term/b"
    phone="6122211661,,,"
    code="428742"
    phone_end="#"

    # Arguments passed in by the <job>_start or <job>_stop script
    start_stop=$1
    job_type=$2
    reason=$3

    if [ "$start_stop" = 0 ]; then
        if [ "$job_type" = 1 ]; then
            # The service is shut down, so we page.
            echo "atdt$phone$code$phone_end" > "$direct" &
        fi
    else
        if [ "$job_type" = 0 ]; then
            # The backup job is starting, so we page.
            echo "atdt$phone$code$phone_end" > "$direct" &
        fi
    fi

    In this example, the HATS HA paging system dials (612) 221-1661 when there is a problem. The code that
    signals a failure condition in your network is 428742 (HATSHA on a telephone keypad). You can change
    how you are notified to suit the kind of pager you carry.
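
    To check the paging logic without a modem attached, the decision and the dial string can be factored into functions and exercised by hand. This is a sketch under assumptions: the function names should_page and dial_string are hypothetical, the meaning of job_type (1 for service, 0 for backup) is inferred from the comments in page_adm, and reading the commas in the phone number as dial pauses follows the usual Hayes-style modem convention rather than anything stated for page_adm.

```shell
#!/bin/sh
phone="6122211661,,,"
code="428742"
phone_end="#"

# Succeeds (status 0) when page_adm, as listed above, would page.
# $1 = start_stop (0 = stop, otherwise start)
# $2 = job_type   (1 = service, 0 = backup; inferred from page_adm's comments)
should_page() {
    if [ "$1" = 0 ]; then
        [ "$2" = 1 ]
    else
        [ "$2" = 0 ]
    fi
}

# Print the string that page_adm writes to the modem device.
dial_string() {
    echo "atdt$phone$code$phone_end"
}
```

    For example, should_page 0 1 succeeds (the service has stopped), while should_page 0 0 fails, so a standby machine that stops does not trigger a page.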

  2. Merging the problem-alerting executable file with the <job>_start and <job>_stop scripts in HATS HA.

    To have the <job>_start and <job>_stop scripts execute your problem-alerting executable file, you need
    to merge them. The following examples show a <job>_start script without the problem-alerting executable
    file and one with it. Comparing the two shows how they should be merged.

    1. Example of <job>_start script without the problem-alerting executable file:

      #! /bin/sh

      if [ "$1" = 1 ]
      then
          echo Service stop reason $2
      else
          echo Standby stop reason $2
      fi

      su oracle -c "/oracle/db_stop.sh"
      PATH=/oracle/bin:$PATH
      export PATH
      ORACLE_HOME=/oracle
      export ORACLE_HOME
      ORACLE_SID=db
      export ORACLE_SID
      tcpctl stop

      exit 0

      In this example, the <job>_start script is edited to start an Oracle job, but has not been told to notify
      anyone about any problem. Since this is for an Oracle job, the file name is /usr/HA/ha_file/oracle_start.

    2. Example of <job>_start script with the problem-alerting executable file:

      #! /bin/sh

      if [ "$1" = 1 ]
      then
          echo Service stop reason $2
      else
          echo Standby stop reason $2
      fi

      # Invoke page_adm and pass 0 (stop) to it, followed by the two
      # parameters received from HATS HA. page_adm decides whether to page.
      /usr/HA/ha_file/page_adm 0 "$1" "$2"

      su oracle -c "/oracle/db_stop.sh"
      PATH=/oracle/bin:$PATH
      export PATH
      ORACLE_HOME=/oracle
      export ORACLE_HOME
      ORACLE_SID=db
      export ORACLE_SID
      tcpctl stop

      exit 0
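
    The start side can be wired up the same way. The sketch below assumes, from the 0 (stop) convention used above, that a nonzero first argument such as 1 tells page_adm that the job is starting. The wrapper name notify_start and the PAGE_ADM override variable are hypothetical, added so that the call can be tried with a stand-in program instead of the real pager script.

```shell
#!/bin/sh
# Path to the alerting program; overridable for testing (hypothetical variable).
PAGE_ADM=${PAGE_ADM:-/usr/HA/ha_file/page_adm}

# Hypothetical wrapper for use in a <job>_start script.
# $1 = machine role, $2 = reason, both as received from HATS HA.
notify_start() {
    # 1 is assumed to mean "start"; this section only documents 0 (stop).
    "$PAGE_ADM" 1 "$1" "$2"
}
```

    Setting PAGE_ADM=echo before calling notify_start "$1" "$2" prints the arguments that page_adm would receive, which is a quick way to verify the wiring before a real failover.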

